
09/719514 
526 Rec'd PCT/PTO 13 DEC 2000



SPECIFICATION



VOICE RECOGNITION DEVICE FOR TOYS 



BACKGROUND OF THE INVENTION 



Field of the Invention 

The present invention relates to a voice recognition device for toys for recognizing voices of a number of unspecified people.

Description of Related Art

In conventional voice recognition devices for toys, a device for recognizing a specific person is designed to recognize words spoken by only one person, and the speaker's voice has to be registered in a RAM or a ROM before he or she actually uses the toy. Although the recognition rate of such a device is not bad, it has the following problems: no one other than the registered person can use the toy, the user must register his or her voice beforehand, and the registered voice is lost once the power is turned off. Such a device is therefore not suitable for toys, in particular toys for very young children. The most critical of these problems is that the device allows only one person to use it, which limits its applications.

On the other hand, a voice recognition device for recognizing voices of unspecified people is designed to recognize the voice of any person, and no registration is required before the device is actually used. However, voices of a number of people must be stored in a ROM in advance, and producing this initial voice data is difficult work. In addition, as the number of words or speeches to be recognized increases, the work becomes more complicated and the memory capacity for storing the growing data has to be extended, resulting in high production costs. Japanese Examined Patent Publication No. 2-39798 discloses a related conventional example. In this conventional example, the length of an inputted spoken word is measured, and when the measured length is determined to coincide with the length of the word set by a voice registration switch, a voice is outputted. However, when the length of a word is measured in this way, malfunctions occur repeatedly in a noisy place, and the device has been found to be unsuitable for practical use.

In addition, although the device is designed to recognize words or speeches of unspecified people, it can recognize only on the order of ten to twenty words or speeches, and it cannot recognize every word people speak. As a result, the user has to consult an owner's manual every time he or she wants to know what kinds of utterances can be recognized, so a voice recognition device that is meant to be convenient is in fact inconvenient.

An object of the present invention is to provide a voice recognition device for recognizing voices of a number of unspecified people using a microcomputer or a voice synthesis IC, wherein the length of the pause or pauses between two or more words is measured, whereby the voices are recognized.

Another object of the present invention is to 
provide a voice recognition device for recognizing 
voices of a number of unspecified people, wherein the 
length in time of a word spoken by a speaker for 
recognition is measured, whereby the voice is
recognized.

A further object of the present invention is to 
provide a voice recognition device for recognizing 
voices of a number of unspecified people, wherein the 
length in time of a word spoken by a speaker is compared 
with the length in time of a corresponding voice 
synthesized word, and in the event that the result of 
the comparison falls within a predetermined tolerance, 
the word spoken by the speaker is recognized.

SUMMARY OF THE INVENTION

According to a first aspect of the present invention, there is provided a voice recognition device for toys comprising a storage means for measuring in advance the length in time of a combination of two or more continuous words or expressions and the length in time of a pause or pauses between the words or expressions, and then storing the measured values; a control means for measuring the length in time of a word or expression spoken by a speaker, comparing the measured value with the values stored in the storage means, and recognizing the word or expression of the speaker in the event that the result of the comparison falls within a predetermined tolerance; and an output means for outputting the result of the recognition so carried out.
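The first aspect can be illustrated with a minimal Python sketch. It assumes each stored template holds an inclusive (minimum, maximum) range in milliseconds for every word and pause; the class and function names and all numbers are hypothetical, since the specification gives neither concrete durations nor a data layout.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Template:
    """Hypothetical stored template for one multi-expression utterance.

    `ranges` alternates word, pause, word, ... and each entry is an
    inclusive (min_ms, max_ms) tolerance around the stored length.
    """
    meaning: str
    ranges: tuple[tuple[int, int], ...]


def matches(measured: list[int], template: Template) -> bool:
    """True only if every measured segment falls inside its stored range."""
    if len(measured) != len(template.ranges):
        return False
    return all(lo <= m <= hi for m, (lo, hi) in zip(measured, template.ranges))


def recognize(measured: list[int], templates: list[Template]) -> str | None:
    """Meaning of the first matching template, or None (nothing is output)."""
    for template in templates:
        if matches(measured, template):
            return template.meaning
    return None


# Illustrative numbers only: "Konnichiwa" / pause / "Ii tenki desu".
GREETING = Template("Hello, it is good weather",
                    ((400, 700), (100, 400), (600, 1000)))

print(recognize([550, 250, 800], [GREETING]))  # Hello, it is good weather
print(recognize([550, 900, 800], [GREETING]))  # None: pause too long
```

Rejecting the whole utterance when any single word or pause is out of range mirrors the stated behavior of the control means, which does not recognize a combination whose segment lengths differ from those stored.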

According to this configuration, in addition to recognizing the meaning of a single word or expression, the meaning of a set of two or more words or expressions spoken continuously can be recognized from the combination of those words or expressions and the pause or pauses between them as they are spoken. For example, in recognizing a combination of the two expressions "Konnichiwa (hello)" and "Ii tenki desu (it is good weather)," the first expression and the second expression are recognized together with the pause that occurs between them when they are spoken, whereby the meaning of the combined expression, "Konnichiwa, ii tenki desu (Hello, it is good weather)," can be recognized. Because some people speak fast and others slowly, each expression is made recognizable in two ways, short and long. When the first and second expressions are each recognizable in two ways, the two expressions together can be recognized in four ways. When the pause between the first and second expressions is also made recognizable in two ways, short and long, the combination of the expressions can be recognized in eight ways, whereby a voice recognition device for toys with higher accuracy can be provided.
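The count of eight follows from two duration classes for each of three segments, 2 × 2 × 2 = 8. The sketch below enumerates those variants; the "short" and "long" ranges in milliseconds are hypothetical, as the specification does not give them.

```python
from __future__ import annotations

from itertools import product

# Hypothetical "short" and "long" duration ranges (ms) for each segment
# of a two-expression utterance; the specification gives no numbers.
SEGMENTS = {
    "first expression": {"short": (400, 550), "long": (551, 700)},
    "pause": {"short": (100, 250), "long": (251, 400)},
    "second expression": {"short": (600, 800), "long": (801, 1000)},
}


def variants(segments: dict) -> list[tuple[str, ...]]:
    """Every short/long choice across the segments, in segment order."""
    return list(product(*(seg.keys() for seg in segments.values())))


def variant_ranges(segments: dict, choice: tuple[str, ...]) -> list[tuple[int, int]]:
    """Stored range for each segment under one short/long choice."""
    return [seg[c] for seg, c in zip(segments.values(), choice)]


print(len(variants(SEGMENTS)))  # 8 = 2 (first) x 2 (pause) x 2 (second)
```

Each variant can then be stored as its own template, so a fast speaker and a slow speaker both fall inside one of the eight.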

Thus, in the event that the length of any of the two or more words or expressions differs from that stored in the storage means, or the length of the pause between any two of the words or expressions differs from that stored in the storage means, the control means does not recognize the combination, so that no malfunction or misrecognition occurs.

In addition, according to a second aspect of the present invention, there is provided a voice recognition device for toys comprising a storage means for measuring in advance the length in time of a word or expression to be spoken by a speaker for recognition and storing the measured value; a control means for measuring the length in time of a word or expression spoken by a speaker, comparing the measured value with the value stored in the storage means, and recognizing the word or expression of the speaker in the event that the result of the comparison falls within a predetermined tolerance; and an output means for outputting the result of the recognition as a voice.

According to this configuration, since the voice recognition device is designed for use in toys for children, when a child as a player speaks to the device, the device measures the length in time of the player's word or expression, compares it with the stored value, recognizes the word or expression in the event that the result of the comparison falls within a predetermined tolerance, and outputs the result of the recognition as a voice through the device main body. For instance, in the case of a cat toy, when the player calls the name of the cat toy, "Tama," it answers the player by mewing. Thus, according to the present invention, an interactive voice recognition toy like the one described above can be provided.
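The second aspect reduces to looking up a reply by measured length. A minimal sketch, assuming a hypothetical table of (range, reply) pairs; the 300 to 600 ms range for "Tama" is illustrative only.

```python
from __future__ import annotations

# Hypothetical response table: stored duration range (ms) -> voiced reply.
# The specification gives the "Tama" / mewing example but no numbers.
RESPONSES = [
    ((300, 600), "mew"),  # "Tama", a two-syllable name
]


def respond(measured_ms: int) -> str | None:
    """Reply whose stored range contains the measured length, else None."""
    for (lo, hi), reply in RESPONSES:
        if lo <= measured_ms <= hi:
            return reply
    return None  # outside every tolerance: the toy stays silent


print(respond(450))  # mew
```

In this sketch any utterance whose length falls in the range triggers the reply; the word, pause, word combinations of the first aspect are what narrow this down.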

Furthermore, according to a third aspect of the invention, there is provided a voice recognition device for toys comprising a storage means for storing in advance the length in time of a voice synthesized word or expression; an output means for outputting the voice synthesized word or expression; a control means for measuring the length in time of a word or expression spoken by a speaker, comparing the measured value with the length in time of the voice synthesized word or expression stored in the storage means, and recognizing the word or expression of the speaker in the event that the result of the comparison falls within a predetermined tolerance; and an outputting means for outputting the result of the recognition.

According to this configuration, a conversation with a machine can be realized by making an IC execute both voice synthesis and voice recognition, and moreover at extremely low cost. For example, in a case where the expression "ohayo (good morning)" is voice synthesized, if the length of an expression spoken by a speaker for recognition falls within a predetermined tolerance of the voice synthesized expression "ohayo (good morning)," the spoken expression can be recognized. The voice synthesized expression "ohayo (good morning)" is provided with a predetermined tolerance in length, short and long; therefore, even if the expression is spoken fast or slowly, it can be recognized as long as the length of the spoken expression falls within that tolerance.
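One way to express a short and long tolerance around a synthesized word is a band from the stored synthesized length minus a short allowance to that length plus a long allowance. This is a sketch under assumptions: the 700 ms length for "ohayo" and the 200 ms and 300 ms allowances are hypothetical values, not taken from the specification.

```python
from __future__ import annotations


def tolerance_band(synth_ms: int, short_ms: int, long_ms: int) -> tuple[int, int]:
    """Accepted range for a spoken reply to a synthesized word.

    short_ms / long_ms are hypothetical allowances for speakers who
    talk faster or slower than the synthesized voice.
    """
    return synth_ms - short_ms, synth_ms + long_ms


def recognized(spoken_ms: int, synth_ms: int,
               short_ms: int = 200, long_ms: int = 300) -> bool:
    """True if the spoken length falls inside the band (inclusive)."""
    lo, hi = tolerance_band(synth_ms, short_ms, long_ms)
    return lo <= spoken_ms <= hi


OHAYO_MS = 700  # hypothetical stored length of the synthesized "ohayo"

print(recognized(550, OHAYO_MS), recognized(1100, OHAYO_MS))  # True False
```

Because the stored value is the length of the IC's own synthesized word, no separate recording of human speakers is needed to build the reference.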

Furthermore, according to a fourth aspect of the present invention, there is provided a voice recognition device for toys as set forth in the third aspect above, comprising a control means for measuring the length in time of a word or expression spoken by a speaker which corresponds to the outputted voice synthesized word or expression, comparing the measured value with the length in time of the voice synthesized word or expression stored in the storage means, and recognizing the spoken word or expression of the speaker in the event that the result of the comparison falls within a predetermined tolerance; and an outputting means for outputting the recognized result.

According to this configuration, the voice recognition device measures the length in time of the word or expression spoken by the speaker which corresponds to the outputted voice synthesized word or expression, and recognizes the word or expression of the speaker provided that the measured value falls within the predetermined tolerance. Namely, the player can enjoy a quiz by answering with a word or expression associated with the outputted voice synthesized word or expression. For example, given the voice synthesized question "What is the highest mountain in Japan?", if the player or speaker answers "Mt. Fuji," the answer is correct and is recognized.
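The quiz of the fourth aspect pairs each synthesized question with an accepted answer length. A minimal sketch; the `Question` structure and the 500 to 900 ms range for an answer such as "Mt. Fuji" are hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Question:
    prompt: str                    # text given to the voice synthesis IC
    answer_range: tuple[int, int]  # hypothetical accepted answer length (ms)


def check_answer(q: Question, answer_ms: int) -> bool:
    """True if the measured answer length falls inside the stored range."""
    lo, hi = q.answer_range
    return lo <= answer_ms <= hi


QUIZ = [
    Question("What is the highest mountain in Japan?", (500, 900)),  # "Mt. Fuji"
]


def run_quiz(measured_answers: list[int]) -> int:
    """Score a quiz given one measured answer length per question."""
    return sum(check_answer(q, ms) for q, ms in zip(QUIZ, measured_answers))


print(run_quiz([700]))  # 1
```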

In addition, according to a fifth aspect of the present invention, there is provided a voice recognition device for toys as set forth in … of the present invention, wherein the storage means stores in advance the length in time of a combination of the length in time of the voice synthesized words or expressions and the length in time of a pause between the words or expressions; and wherein the control means measures the length in time of the pause between the words or expressions and the length in time of the words or expressions spoken by the speaker, compares the measured values with the length in time of the combination of the length of the pause and the length in time of the words or expressions spoken by the speaker for recognition, and recognizes the words or expressions of the speaker provided that the result of the comparison falls within the predetermined tolerance.



According to this configuration, voice recognition with fewer errors can be attained, since the recognition uses the length in time of the combination of the pause, that is, the time from when the voice synthesized words or expressions have been outputted until the speaker utters the words or expressions for recognition, and the length in time of the words or expressions spoken by the speaker.

BRIEF DESCRIPTION OF THE DRAWINGS 
Fig. 1 is a diagram showing a principle according to the present invention for measuring the length in time of a word or expression;

Fig. 2 is a diagram showing another principle according to the present invention for measuring the length in time of a word or expression;

Fig. 3 is a diagram showing a further principle according to the present invention for measuring the length in time of a word or expression;

Fig. 4 is a diagram showing the configuration of hardware for use in the present invention; and

Fig. 5 is a diagram showing another configuration of hardware for use in the present invention.

BEST MODE FOR CARRYING OUT THE INVENTION

With a view to describing the present invention in greater detail, a best mode for carrying out the invention will be described below with reference to the accompanying drawings.

Fig. 1 is a diagram showing a principle according to the present invention for measuring the length in time of a word or expression. Reference character A denotes the length of a word or expression. For example, the expression "konnichiwa (hello)" and the expression "konbanwa (good evening)" each consist of five Japanese Hiragana characters, or five syllables, so the two expressions have the same length, that is, the same number of characters or syllables. Reference character C likewise denotes the length of an expression such as "ii tenki desu (it is good weather)" or "okaimono desu (I'm doing the shopping)," each of seven Japanese Hiragana characters or syllables, and therefore the two expressions are identical in length. Reference character B is the length of the pause between the expression A and the expression C, that is, the length of the pause interposed between "Hello" and "It's good weather" when the speaker says, "Hello, it's good weather."

Thus, the two expressions are spoken continuously, and their meaning is recognized from the combination of the expressions and the pause between them. Consequently, recognition is effected in the event that the length of the combination of the expression A, the pause B and the expression C falls within the tolerance of the length of a set expression. Additionally, recognition is effected even if the combination of the expression A and the expression C, or the combination of the pause B and the expression C, falls within the tolerance of the length of the set expression. All of these recognition processes are carried out by a microcomputer, which will be described later.
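The Fig. 1 rule can be sketched as a check of summed lengths. The sketch assumes each combination (A+B+C, A+C, B+C) has its own hypothetical stored range in milliseconds, because the specification does not state how "the tolerance of the length of the set expression" applies to the partial combinations.

```python
from __future__ import annotations

# Hypothetical stored ranges (ms) for one set expression,
# e.g. "konnichiwa" (A) + pause (B) + "ii tenki desu" (C).
SET_RANGES = {
    "A+B+C": (1100, 2000),
    "A+C": (1000, 1700),
    "B+C": (700, 1400),
}


def combinations(a: int, b: int, c: int) -> dict[str, int]:
    """Total lengths of the combinations named for Fig. 1."""
    return {"A+B+C": a + b + c, "A+C": a + c, "B+C": b + c}


def matching_combinations(a: int, b: int, c: int) -> list[str]:
    """Names of combinations whose total lies in its stored range.

    A non-empty result means the set expression is recognized.
    """
    totals = combinations(a, b, c)
    return [name for name, total in totals.items()
            if SET_RANGES[name][0] <= total <= SET_RANGES[name][1]]


print(matching_combinations(550, 900, 800))  # ['A+C']
```

With a pause of 900 ms the full A+B+C total (2250 ms) misses its range, but A+C still matches, which is the fallback the description of Fig. 1 allows.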



Fig. 2 is a diagram showing another principle according to the present invention for measuring the length in time of a word or expression. Reference character V denotes the length of a voice synthesized expression produced when the voice synthesized expression is outputted by the IC, such as the expression "ohayo (good morning)," which consists of Japanese Hiragana characters or syllables. In the figure, the lower line denotes the voice input, and reference character A is the length of the word or expression, corresponding to the voice synthesized sound, by which the sound produced by the speaker is recognized. Reference character B denotes the pause between the voice synthesized sound and the sound produced by the speaker, that is, the time from when the voice synthesized sound is outputted until the voice is inputted for recognition, and this pause can also be used for voice recognition. In the former case, voice recognition is effected based on the length A of the sound only, whereas in the latter case, voice recognition is effected based on the combination of the length B of the pause and the length A of the sound; therefore, the recognition capability can be increased in the latter case. Furthermore, when the initial voice synthesized sound is outputted, the speaker can be led to think of the word or expression that should follow it. For example, when the question "What is the highest mountain in Japan?" is outputted as the voice synthesized sound V and the speaker answers "Mt. Fuji," the answer is recognized. In this case, the speaker can operate the voice recognition device without consulting the owner's manual every time.
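The difference between the former case (length A only) and the latter case (pause B combined with length A) can be sketched as two predicates. This is one reading of "combination": the sketch requires A to match on its own and the sum B + A to fall in a second range. Both ranges are hypothetical.

```python
from __future__ import annotations

# Hypothetical stored ranges (ms) for the answer to one synthesized question.
ANSWER_RANGE = (500, 900)     # length A of the spoken answer
COMBINED_RANGE = (700, 2400)  # pause B plus length A


def in_range(value: int, rng: tuple[int, int]) -> bool:
    """Inclusive range check."""
    return rng[0] <= value <= rng[1]


def recognize_length_only(a_ms: int) -> bool:
    """Former case: only the length A of the spoken sound is compared."""
    return in_range(a_ms, ANSWER_RANGE)


def recognize_with_pause(b_ms: int, a_ms: int) -> bool:
    """Latter case: A must match, and so must the combined length B + A."""
    return in_range(a_ms, ANSWER_RANGE) and in_range(b_ms + a_ms, COMBINED_RANGE)


# A 700 ms sound arriving 4 s after the prompt: accepted on length alone,
# rejected once the pause is taken into account.
print(recognize_length_only(700), recognize_with_pause(4000, 700))  # True False
```

Stray sounds that happen to have the right length but arrive long after the question are the kind of input the extra pause check filters out.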

Fig. 3 is a diagram showing a further principle according to the present invention for measuring the length in time of a word or expression. Reference character A denotes the length of an expression spoken by the speaker, here the length of the word "Tama" (the name of a cat), which consists of two Japanese Katakana characters or syllables. In the figure, the lower line denotes the voice output, and reference character V denotes a voice synthesized sound produced when a word or expression generated through voice synthesis is outputted by the IC. Here, there is a voice input "Tama," and a synthesized voice "mew" is outputted in reply to it. In this case, the length of the voice input by the speaker is compared with the value stored in the storage part, and when the result of the comparison falls within the predetermined tolerance, the word of the speaker is recognized and the result of the recognition is outputted as a voice.

Fig. 4 is a diagram showing the configuration of hardware for use in the present invention. In this embodiment, voice recognition is carried out with a microcomputer. A voice signal passing through a microphone 1 is amplified at an amplifier 2, and thereafter the analog signal is converted into a digital signal at an integrating circuit 3. The digital signal so converted is then inputted into a microcomputer 4. The microcomputer 4 comprises a storage part for storing the lengths of two or more words or expressions, an operating part for determining whether the length of a word or expression spoken by the speaker falls within a predetermined tolerance of a stored word or expression, and a control part for outputting the result of the recognition. Thus, the signal inputted into the microcomputer 4 is handled by the control part and then passed to the operating part, where an operating process is carried out to determine whether the signal can be recognized as the first word or expression stored in the storage part. When the signal is determined to be recognized as the word or expression of the speaker as a result of the operating process, an LED or bulb is illuminated. The speaker can tell from the illumination of the LED or bulb that the first word has been recognized, and times the input of the second word accordingly. An operating process similar to that carried out for the first word is then performed for the second word, and when the second word is determined to be recognized, the control part outputs an electric signal for driving a motor 5, blinking a bulb, or activating an electromagnet. Thus, the arms, legs, eyes or mouth of a stuffed toy or doll can be activated, and at the same time a conversation with the toy can be realized.
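The specification states that the microcomputer 4 measures word lengths from the digital signal of the integrating circuit 3, but not how the start and end of a word are detected. The sketch below assumes the digital signal is an amplitude envelope sampled at a fixed period and treats a run of samples above a threshold as a word; the sample period, threshold, word ranges, and state names are all hypothetical. It then runs the two-step sequence described for Fig. 4.

```python
from __future__ import annotations

SAMPLE_PERIOD_MS = 10  # hypothetical period of the digitized envelope
THRESHOLD = 20         # hypothetical level separating voice from silence

# Hypothetical stored ranges (ms) for the first and second words.
FIRST_WORD = (300, 600)
SECOND_WORD = (500, 900)


def segments(envelope: list[int]) -> list[tuple[str, int]]:
    """Split an envelope into consecutive ("voice" | "pause", length_ms) runs."""
    runs: list[tuple[str, int]] = []
    for level in envelope:
        kind = "voice" if level > THRESHOLD else "pause"
        if runs and runs[-1][0] == kind:
            runs[-1] = (kind, runs[-1][1] + SAMPLE_PERIOD_MS)
        else:
            runs.append((kind, SAMPLE_PERIOD_MS))
    return runs


def first_voice_ms(envelope: list[int]) -> int | None:
    """Length of the first voiced run, or None if nothing was said."""
    for kind, ms in segments(envelope):
        if kind == "voice":
            return ms
    return None


def step(state: str, word_ms: int) -> tuple[str, str | None]:
    """Advance the two-word sequence; return (new_state, output_action)."""
    if state == "wait_first" and FIRST_WORD[0] <= word_ms <= FIRST_WORD[1]:
        return "wait_second", "LED on"  # speaker now times the second word
    if state == "wait_second" and SECOND_WORD[0] <= word_ms <= SECOND_WORD[1]:
        return "wait_first", "drive motor"  # e.g. motor 5 moves the doll
    return state, None  # not recognized: no output, state unchanged


envelope = [0] * 5 + [50] * 40 + [0] * 10  # silence, a 400 ms word, silence
word_ms = first_voice_ms(envelope)
state, action = step("wait_first", word_ms)
print(word_ms, state, action)  # 400 wait_second LED on
```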

Fig. 5 is a diagram showing the configuration of other hardware for use in the present invention for voice recognition. Normally, an inexpensive microcomputer can be used as the voice recognition microcomputer. However, in this embodiment, in order to further reduce the production cost, an ordinary voice synthesis IC is programmed for this purpose. When a switch SW of a main body is closed, the voice synthesis IC 4 controls the circuit so that a voice synthesized sound is amplified at an amplifier 8 and the amplified voice synthesized sound is then outputted through a speaker 9. When the output of the voice synthesized sound is completed, the LED or bulb is illuminated. The user times the utterance of a word or expression corresponding to the voice synthesized sound through the microphone 1 so that it comes before the LED or bulb is switched off. The user speaks the word or expression corresponding to the voice synthesized sound through the microphone 1 when he or she hears the voice synthesized word or expression. The user may speak it as soon as he or she hears the voice synthesized sound, or may answer after a certain pause following the output of the voice synthesized sound. These operations are all processed by the program. The voice signal passing through the microphone 1 is amplified at an amplifier 2, and thereafter the analog signal is converted into a digital signal at an integrating circuit 3, the digital signal so converted being then inputted into the voice synthesis IC 4. When a word or expression having a length corresponding to the voice synthesized sound is inputted into the voice synthesis IC 4, the result of the voice recognition by the program is outputted in another voice, whereby a motor 5 is driven, the bulb 6 is illuminated or blinked, or the electromagnet 7 is activated. Thus, the arms, legs, eyes or mouth of a doll can be activated through a voiced order, and at the same time a conversation with the toy can be realized.
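The Fig. 5 timing can be sketched from three timestamps: the end of the synthesized prompt (LED on), and the start and end of the user's answer. The sketch assumes the answer must begin after the prompt ends and before the LED switches off; the `Exchange` structure, the 3000 ms LED window, and the answer range are hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Exchange:
    prompt_end_ms: int   # synthesized sound finished, LED turns on
    voice_start_ms: int  # user's answer began
    voice_end_ms: int    # user's answer ended


LED_WINDOW_MS = 3000       # hypothetical time before the LED switches off
ANSWER_RANGE = (500, 900)  # hypothetical accepted answer length (ms)


def answered_in_window(ex: Exchange) -> bool:
    """True if the answer began while the LED was lit and its length matches."""
    latency = ex.voice_start_ms - ex.prompt_end_ms
    if not 0 <= latency <= LED_WINDOW_MS:
        return False  # spoke before the prompt ended or after the LED went off
    length = ex.voice_end_ms - ex.voice_start_ms
    return ANSWER_RANGE[0] <= length <= ANSWER_RANGE[1]


print(answered_in_window(Exchange(1000, 1200, 1900)))  # True
```

An immediate answer and one given after a short pause both pass, as long as the answer starts inside the window.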

According to the present invention, it is possible, by using the voice synthesis IC, to repeat a process in which an answer is given in response to a question from the computer. This resembles a real conversation between human beings, so the user can express his or her wishes in a sequential fashion. Finally, it is possible to make the microcomputer or voice synthesis IC recognize many things and thereby follow orders from the user.

INDUSTRIAL APPLICABILITY
As has been described heretofore, according to the present invention, when the system of the invention is programmed in the microcomputer or voice synthesis IC, a voice recognition device that is simple in configuration and inexpensive can be provided, whereby a conversation with the computer can be realized. In addition, according to the present invention, the recognition accuracy can be increased by combining a plurality of words or expressions and limiting both the length of each word or expression and the length in time of the pause between them. Moreover, according to the present invention, when the voice synthesis IC is used for voice recognition, it is possible to lead the user to think of what to say, or to make the voice recognition device speak guidance, which eliminates the need for an owner's manual explaining how to use the device. Furthermore, according to the present invention, the microcomputer can produce outputs in response to the result of a voice recognition so as to synchronize actions other than a conversation. In addition, since the recognition is based on the length of a sound, the voice of any person can be recognized irrespective of the speaker's sex, age or the like. Additionally, since only data on the lengths of sounds are programmed, the memory capacity of the system can be greatly reduced, whereby a low-priced product can be provided; in particular, when a voice synthesis IC is used, an extremely low-priced product can be provided. In addition, according to the present invention, no voice registration is required at all before use, so the voice recognition device can be used as soon as it is switched on. According to the present invention, although voices of a number of unspecified people can be recognized, no voice data needs to be collected. Furthermore, the voice recognition device according to the present invention is small in size and consumes very little power, so a small device running on a small battery can be produced, which is economical.